Query Point Nearest Neighbor

Authors

  • Kevin Beyer
  • Jonathan Goldstein
  • Raghu Ramakrishnan
Abstract

We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!
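The effect described in the abstract can be illustrated with a small experiment. The sketch below is not the authors' experimental setup: it assumes points and a query drawn uniformly at random from the unit hypercube (the i.i.d. special case), uses Euclidean distance, and the function name `relative_contrast` and its parameters are hypothetical. It measures how far apart the nearest and farthest points are relative to the nearest distance; a value near zero means the nearest point is barely closer than the farthest.

```python
import math
import random


def relative_contrast(dim, n_points=1000, seed=0):
    """Return (d_max - d_min) / d_min for one random query.

    Assumption: query and data are i.i.d. uniform in [0, 1]^dim,
    and distance is Euclidean. This is only one of the workloads
    the abstract's claim covers.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# Print the contrast for a few dimensionalities.
for dim in (2, 10, 100):
    print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed contrast should shrink as `dim` grows, which is the qualitative behavior the abstract describes. The exact values depend on the seed and the sample size.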


Similar resources

Non-zero probability of nearest neighbor searching

Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is to efficiently report the data nearest to a given query object. Most studies assume that both the data and the query are precise; however, due to the real applications of NN searching, suc...

Nearest Neighbors Problem

DEFINITION Given a set of n points and a query point, q, the nearest-neighbor problem is concerned with finding the point closest to the query point. Figure 1 shows an example of the nearest neighbor problem. On the left side is a set of n = 10 points in a two-dimensional space with a query point, q. The right shows the problem solution, s. Figure 1: An example of a nearest-neighbor problem dom...
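As a minimal sketch of the definition above, the nearest neighbor can be found by a linear scan over all n points. The definition does not name a distance function, so Euclidean distance is an assumption here, and the function name `nearest_neighbor` and the sample points are hypothetical.

```python
import math


def nearest_neighbor(points, q):
    """Return the point in `points` closest to query `q`.

    Linear scan: computes the distance from q to every point once.
    Assumption: Euclidean distance; ties go to the earliest point.
    """
    return min(points, key=lambda p: math.dist(p, q))


# Hypothetical two-dimensional example.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (3.0, 3.0)]
print(nearest_neighbor(points, (1.8, 0.4)))
```

The scan costs one distance computation per point, which makes it the natural baseline against which an index structure can be compared.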

Dynamic Nearest Neighbor Queries in Euclidean Space

Given a query point q and a set D of data points, a nearest neighbor (NN) query returns the data point p in D that minimizes the distance DIST(q,p), where the distance function DIST(,) is the L2 norm. One important variant of this query type is kNN query, which returns k data points with the minimum distances. When taking the temporal dimension into account, the kNN query result may change over...
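The static kNN query described above can be sketched as follows, with DIST taken as the L2 norm as stated. This is an illustrative brute-force version, not the method of the cited paper; the name `knn` and the sample data are hypothetical.

```python
import heapq
import math


def knn(data, q, k):
    """Return the k points of `data` with the smallest L2 distance to q.

    The result is ordered from nearest to farthest. With k = 1 this
    reduces to the plain nearest neighbor (NN) query.
    """
    return heapq.nsmallest(k, data, key=lambda p: math.dist(q, p))


# Hypothetical data set D and query point q.
D = [(3.0, 4.0), (1.0, 1.0), (0.0, 1.0), (5.0, 5.0)]
q = (0.0, 0.0)
print(knn(D, q, 2))
```

`heapq.nsmallest` keeps only k candidates at a time, so the query does not need to sort the whole data set.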

Processing All k-Nearest Neighbor Queries in Hadoop

A k-nearest neighbor (kNN) query, which retrieves the nearest k points from a database, is one of the fundamental query types in spatial databases. An all k-nearest neighbor query (AkNN query), a variation of a kNN query, determines the k-nearest neighbors for each point in the dataset in a query process. In this paper, we propose a method for processing AkNN queries in Hadoop. We decompose the give...

Novel Forms of Nearest Neighbor Queries in Spatio-Temporal Applications

Several types of nearest neighbor (NN) search have been proposed and studied in the context of spatial databases. The most common type is the point NN query, which retrieves the nearest neighbors of an input point. Such a query, however, is usually meaningless in highly dynamic environments where the query point or the database objects move/change over time. In this paper we study alternative f...

Decreasing Radius K-Nearest Neighbor Search using Mapping-based Indexing Schemes

A decreasing radius k-nearest neighbor search algorithm for the mapping-based indexing schemes is presented. We implement the algorithm in the Pyramid technique and the iMinMax(θ), respectively. The Pyramid technique divides d-dimensional data space into 2d pyramids, and the iMinMax(θ) divides the data points into d partitions. Given a query point q, we initialize the radius of a range query to...


Publication date: 1999